19 research outputs found

    Improvement Energy Efficiency for a Hybrid Multibank Memory in Energy Critical Applications

    Get PDF
    High performance, low power multiprocessor/multibank memory system requires a compiler that provides efficient data partitioning and mapping procedures. This paper introduced two compiler techniques for the data mapping to multibank memory, since data mapping is still an open problem and needs a better solution. The multibank memory can be consisted of volatile and non-volatile memory components to support ultra-low powered wearable devices. This hybrid memory system including volatile and non-volatile memory components yields higher complexity to map data onto it. To efficiently solve this mapping problem, we formulate it to a simple decision problem. Based on the problem definition, we proposed two efficient algorithms to determine the placement of data to the multibank memory. The proposed techniques consider the characteristic of the non-volatile memory that its write operation consumes more energy than the same operation of a volatile memory even though it provides ultra-low operation power and nearly zero leakage current. The proposed technique solves this negative effect of non-volatile memory by using efficient data placement technique and hybrid memory architecture. In experimental section, the result shows that the proposed techniques improve energy saving up to 59.5% for the hybrid multibank memory architecture

    Instruction Re-Selection for Iterative Modulo Scheduling on High Performance Multi-Issue DSPs

    Get PDF
    An iterative modulo scheduling is very important for compilers targeting high performance multi-issue digital signal processors. This is because these processors are often severely limited by idle state functional units and thus the reduced idle units can have a positively significant impact on their performance. However, complex instructions, which are used in most recent DSPs such as mac, usually increase data dependence complexity, and such complex dependencies that exist in signal processing applications often restrict modulo scheduling freedom and therefore, become a limiting factor of the iterative modulo scheduler. In this work, we propose a technique that efficiently reselects instructions of an application loop code considering dependence complexity, which directly resolve the dependence constraint. That is specifically featured for accelerating software pipelining performance by minimizing length of intrinsic cyclic dependencies. To take advantage of this feature, few existing compilers support a loop unrolling based dependence relaxing technique, but only use them for some limited cases. This is mainly because the loop unrolling typically occurs an overhead of huge code size increment, and the iterative modulo scheduling with relaxed dependence techniques for general cases is an NP-hard problem that necessitates complex assignments of registers and functional units. Our technique uses a heuristic to efficiently handle this problem in pre-stage of iterative modulo scheduling without loop unrolling

    Malware Visualization and Similarity via Tracking Binary Execution Path

    Get PDF
    Today, computer systems are widely and importantly used throughout society, and malicious codes to take over the system and perform malicious actions are continuously being created and developed. These malicious codes are sometimes found in new forms, but in many cases they are modified from existing malicious codes. Since there are too many threatening malicious codes that are being continuously generated for human analysis, various studies to efficiently detect, classify, and analyze are essential. There are two main ways to analyze malicious code. First, static analysis is a technique to identify malicious behaviors by analyzing the structure of malicious codes or specific binary patterns at the code level. The second is a dynamic analysis technique that uses virtualization tools to build an environment in a virtual machine and executes malicious code to analyze malicious behavior. The method used to analyze malicious codes in this paper is a static analysis technique. Although there is a lot of information that can be obtained from dynamic analysis, there is a disadvantage that it can be analyzed normally only when the environment in which each malicious code is executed is matched. However, since the method proposed in this paper tracks and analyzes the execution stream of the code, static analysis is performed, but the effect of dynamic analysis can be expected.The core idea of this paper is to express the malicious code as a 25 25 pixel image using 25 API categories selected. The interaction and frequency of the API is made into a 25 25 pixel image based on a matrix using RGB values. When analyzing the malicious code, the Euclidean distance algorithm is applied to the generated image to measure the color similarity, and the similarity of the mutual malicious behavior is calculated based on the final Euclidean distance value. As a result, as a result of comparing the similarity calculated by the proposed method with the similarity calculated by the existing similarity calculation method, the similarity was calculated to be 5-10% higher on average. The method proposed in this study spends a lot of time deriving results because it analyzes, visualizes, and calculates the similarity of the visualized sample. Therefore, it takes a lot of time to analyze a huge number of malicious codes. A large amount of malware can be analyzed through follow-up studies, and improvements are needed to study the accuracy according to the size of the data set

    Preprocessing Strategy for Effective Modulo Scheduling on Multi-Issue Digital Signal Processors

    No full text
    Abstract. To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often prevent the modulo scheduler from achieving ideal loop initiation intervals. As a result, replicated functional units in multi-issue DSPs are frequently underutilized. In response to this resource underutilization problem, this paper describes a compiler preprocessing strategy that capitalizes on two techniques for effective modulo scheduling, referred to as cloning1 and cloning2. The core of the proposed techniques lies in the direct relaxation of cyclic data dependences by exploiting functional units which are otherwise left unused. Since our preprocessing strategy requires neither code duplication nor additional hardware support, it is relatively easy to implement in DSP compilers. The strategy proposed has been validated by an implementation for a StarCore SC140 optimizing C compiler.

    Instruction Re-selection for Iterative Modulo Scheduling on High Performance Muti-issue DSPs

    No full text
    Abstract. An iterative modulo scheduling is more important for compilers targeting high performance multi-issue digital signal processors than ever before because these processors are often severely limited by idle state functional units due to dependence constraint and thus the reduced the idle units can have a positively significant impact on their performance. However, complex instructions, which are used in most recent DSPs such as mac, usually increase a data dependence complexity, and such complex dependencies that exist in signal processing applications often restrict modulo scheduling freedom and therefore, become a limiting factor of the iterative modulo scheduler. In this work, we propose a novel technique that efficiently reselect instructions of an application loop code considering with dependence complexity, which directly resolve the dependence constraint on a loop body, that are specifically featured for accelerating software pipelining performance by minimizing length of intrinsic cyclic dependencies in a loop code. To take advantage of this feature, few existing compilers support a loop unrolling based dependence relaxing technique, but only use them for some limited cases. This is mainly because the loop unrolling typically occurs an overhead of huge code size increment, and the iterative modulo scheduling with relaxed dependence techniques for general cases is an NP-hard problem that necessitates complex assignments of registers and functional units in resource reservation table [8]. Our technique uses a heuristic to efficiently handle this problem in pre-stage of iterative modulo scheduling without loop unrolling. key words: code generation and optimization, application specific embedded software design, software pipelining, dependence analysis, high performance DSPs

    General Terms

    No full text
    Many embedded array-intensive applications have irregular access patterns that are not amenable to static analysis for extraction of access patterns, and thus prevent efficient use of a Scratch Pad Memory (SPM) hierarchy for performance and power improvement. We present a profiling based strategy that generates a memory access trace which can be used to identify data elements with fine granularity that can profitably be placed in the SPMs to maximize performance and energy gains. We developed an entire toolchain that allows incorporation of the code required to profitably move data to SPMs; visualization of the extracted access pattern after profiling; and evaluation/exploration of the generated application code to steer mapping of data to the SPM to yield performance and energy benefits. We present a heuristic approach that efficiently exploits the SPM using the profiler-driven access pattern behaviors. Experimental results on EEMBC and other industrial codes obtained with our framework show that we are able to achieve 36 % energy reduction and reduce execution time by up to 22 % compared to a cache based system. Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features—dynamic storage management; D.3.4 [Programming Languages]

    High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures

    No full text
    Coarse-grained reconfigurable arrays (CGRAs) are a very promising platform, providing both up to 10-100 MOps/mW of power efficiency and software programmability. However, this promise of CGRAs critically hinges on the effectiveness of application mapping onto CGRA platforms. While previous solutions have greatly improved the computation speed, they have largely ignored the impact of the local memory architecture on the achievable power and performance. This paper motivates the need for memory-aware application mapping for CGRAs, and proposes an effective solution for application mapping that considers the effects of various memory architecture parameters including the number of banks, local memory size, and the communication bandwidth between the local memory and the external main memory. Further we propose efficient methods to handle dependent data on a double-buffering local memory, which is necessary for recurrent loops. Our proposed solution achieves 59% reduction in the energy-delay product, which factors into about 47% and 22% reduction in the energy consumption and runtime, respectively, as compared to memory-unaware mapping for realistic local memory architectures. We also show that our scheme scales across a range of applications and memory parameters, and the runtime overhead of handling recurrent loops by our proposed methods can be less than 1%.close
    corecore